Regular Expressions

Regular expressions can be used in Find operations to refine your searches. The following characters and sequences are treated as expressions when the Regular Expressions checkbox in the Find dialog box is checked.

Special Characters

. (period) Matches any character, except the end-of-line.

^ (caret) Matches the actual beginning-of-line position or the preceding line-delimiter character pair (also see [^] below for usage within a character class definition).

$ (dollar) Matches the end-of-line position.

| (stile) Specifies alternation (the OR operator), so that the expression on either side can match. Precedence is from left to right, as encountered in the expression.

? (question mark) Specifies that zero or one match of the preceding sub-pattern is allowed. Cannot be used with a Tag.

* (asterisk) Specifies that zero or more matches of the preceding sub-pattern are allowed.

\ (backslash) The character following the backslash is treated as a literal value, rather than being interpreted as a special character, when it is any of the special characters above or one of the following: [ - +
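
The special characters above can be tried out in a short script. The sketch below uses Python's re module as a stand-in, which is an assumption: the Find dialog's own engine may differ in details (for example, Python needs the re.MULTILINE flag before ^ and $ match at every line, and it is case-sensitive by default). The sample strings and variable names are illustrative only.

```python
import re

# Three lines of sample text, separated by linefeeds.
text = "cat\ncot\ncart"

# .  matches any single character except the end-of-line.
dot = re.findall(r"c.t", text)

# ^ and $ anchor the match to the beginning and end of a line.
# In Python, re.MULTILINE is required for per-line anchoring.
anchored = re.findall(r"^c.t$", text, re.MULTILINE)

# |  alternation: the expression on either side can match.
either = re.findall(r"cat|cart", text)

# ?  zero or one match of the preceding sub-pattern.
optional = re.findall(r"car?t", text)

# *  zero or more matches of the preceding sub-pattern.
repeated = re.findall(r"ca*t", "ct cat caat")

# \  the following special character is treated as a literal.
literal_dot = re.findall(r"3\.14", "3.14 3x14")

print(dot, anchored, either, optional, repeated, literal_dot)
```

Note that the unescaped pattern 3.14 would also match "3x14", which is why the period is escaped in the last search.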

Escaped Characters

\b A word boundary. Finds the expression at the start or end of a word. For example, "na\b" would only find the last 'na' in the word 'banana'.

\c Case-sensitive search. Without the \c operator, the default is to ignore case when matching.

\n Linefeed (or newline)

\q Double-quote mark ("). For example: "\qHello\q".

\r Carriage-return character

\s Shortest match character: The \s flag causes the shortest matching string to be returned, rather than the longest (the default). For example, when searching for the mask "abc.*abc" in "abcdabcabc", the default setting would return position 1 and length 10. With the \s switch set, it returns position 1 and length 7. This option may cause a slight increase in processing time.

\t Horizontal tab character

\x## Hex character code: Indicates that a character code follows, given by two hexadecimal digits. For example, \xFF = ANSI 255. ## must be in the range 00 through FF (decimal 0 through 255).
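
The worked examples in this section can be checked with a short script. The sketch below again assumes Python's re module as a stand-in, and several escapes do not carry over: in Python, \s means whitespace rather than shortest match (a lazy quantifier such as .*? is used instead), there is no \c or \q, and matching is case-sensitive unless re.IGNORECASE is given. The helper position_and_length is hypothetical and reports positions counted from 1, as this page does.

```python
import re

def position_and_length(pattern, text, flags=0):
    """Return (1-based position, length) of the first match, or None."""
    m = re.search(pattern, text, flags)
    if m is None:
        return None
    return (m.start() + 1, m.end() - m.start())

# \b  word boundary: only the last 'na' in 'banana' ends a word.
word_end = position_and_length(r"na\b", "banana")

# Longest match (the default) versus shortest match.
# Python has no \s switch; the lazy quantifier .*? gives the shortest match.
longest = position_and_length(r"abc.*abc", "abcdabcabc")
shortest = position_and_length(r"abc.*?abc", "abcdabcabc")

# Case handling is inverted in Python: case-sensitive by default,
# so re.IGNORECASE imitates a search without the \c operator.
ignore_case = position_and_length(r"hello", "Say HELLO", re.IGNORECASE)
case_sensitive = position_and_length(r"hello", "Say HELLO")

# \x##  hex character code: \xFF is the character with code 255.
hex_match = position_and_length(r"\xFF", "a" + chr(255))

print(word_end, longest, shortest, ignore_case, case_sensitive, hex_match)
```

The longest and shortest results reproduce the figures quoted for \s above: position 1 with length 10, and position 1 with length 7.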

Character Classes

[ ] (square brackets) Identifies a user-defined class of characters, any of which will match: [abc] will match a, b, or c.

[-] (hyphen) The hyphen identifies a range of characters to match. For example, [a-f] will match a, b, c, d, e, or f.

Characters in an individual range must occur in the natural order as they appear in the character set. For example, [f-a] will match nothing.

Lists of characters, and one or more ranges of characters, may be intermixed in a single class definition. The start and end of a range may be specified by a literal character, or one of the \ backslash escape sequences:

\\ \- \] \e \f \n \q \r \t \v \x##

Multiple ranges in a class are valid. For example, [a-d2-5] matches a, b, c, d, 2, 3, 4, or 5.

When the hyphen is escaped, it is treated as a literal. For example, [a\-c] is a list, not a range, and matches a, -, or c due to the \ backslash escape sequence.
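
The list and range rules above can be illustrated with a few searches. This is a minimal sketch that assumes Python's re module as a stand-in for the Find dialog's engine; the sample strings are made up for illustration. One known difference is flagged in the code: Python rejects a reversed range such as [f-a] with an error, whereas the behavior described here is that it matches nothing.

```python
import re

# [abc]  a list: any one of a, b, or c matches.
list_class = re.findall(r"[abc]", "cabbage")

# [a-f]  a range: a, b, c, d, e, or f.
range_class = re.findall(r"[a-f]", "bigfoot")

# [a-d2-5]  multiple ranges in one class.
multi_range = re.findall(r"[a-d2-5]", "a1b2e5d9")

# [a\-c]  the escaped hyphen is a literal, so this is a list: a, -, or c.
escaped_hyphen = re.findall(r"[a\-c]", "a-b-c")

# [f-a]  a range out of natural order. Python refuses to compile it,
# which differs from the "matches nothing" behavior described above.
try:
    re.compile(r"[f-a]")
    reversed_range_compiles = True
except re.error:
    reversed_range_compiles = False

print(list_class, range_class, multi_range, escaped_hyphen, reversed_range_compiles)
```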

[^] (caret) When the caret appears as the first item in a class definition, it identifies a complemented class: the characters listed will not match, and any other character will. For example, [^abc] matches any character except a, b, or c.

A range can also be specified for the complemented class. For example, [^a-z] matches any character except a through z.

A caret located in any position other than the first is treated as a literal character.
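
The three caret rules can be compared side by side. As before, this sketch assumes Python's re module as a stand-in for the Find dialog's engine, and the sample strings are illustrative only.

```python
import re

# [^abc]  complemented class: anything except a, b, or c.
not_abc = re.findall(r"[^abc]", "cabbage")

# [^a-z]  complemented range: anything except a through z.
not_lowercase = re.findall(r"[^a-z]", "abc123XYZ")

# [a^b]  a caret that is not the first item is a literal character.
literal_caret = re.findall(r"[a^b]", "a^b^c")

print(not_abc, not_lowercase, literal_caret)
```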